Pruning Random Forests for Prediction on a Budget

Authors

  • Feng Nan
  • Joseph Wang
  • Venkatesh Saligrama
Abstract

We propose to prune a random forest (RF) for resource-constrained prediction. We first construct an RF and then prune it to optimize expected feature cost and accuracy. We pose RF pruning as a novel 0-1 integer program with linear constraints that encourage feature re-use. We establish total unimodularity of the constraint set to prove that the corresponding LP relaxation solves the original integer program. We then exploit connections to combinatorial optimization and develop an efficient primal-dual algorithm that scales to large datasets. In contrast to our bottom-up approach, which benefits from a good RF initialization, conventional methods are top-down, acquiring features based on their utility value, and are generally intractable, requiring heuristics. Empirically, our pruning algorithm outperforms existing state-of-the-art resource-constrained algorithms.
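The total-unimodularity argument above is the key to why an LP relaxation suffices. A minimal sketch (a toy 0-1 program, not the paper's actual pruning formulation): when the constraint matrix is totally unimodular, e.g. an interval matrix with consecutive ones, and the right-hand side is integral, the LP relaxation has an integral optimal vertex, so solving the LP solves the integer program.

```python
import numpy as np
from scipy.optimize import linprog

# Toy illustration (not the paper's formulation): a 0-1 program whose
# constraint matrix has the consecutive-ones property, hence is totally
# unimodular. Solving only the LP relaxation yields a 0-1 solution.

# Minimize c @ z subject to A @ z <= b, 0 <= z <= 1
c = np.array([3.0, -2.0, -5.0, 1.0])
A = np.array([
    [1, 1, 0, 0],   # consecutive-ones rows -> totally unimodular matrix
    [0, 1, 1, 0],
    [0, 0, 1, 1],
])
b = np.array([1, 1, 1])

res = linprog(c, A_ub=A, b_ub=b, bounds=[(0, 1)] * 4, method="highs")
print(res.x)  # optimal LP solution is integral: selects only z3
```

Here the unique LP optimum is z = (0, 0, 1, 0) with objective -5, which is exactly the integer optimum; no rounding or branch-and-bound is needed.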


Similar resources

PRUNING HEIGHT AND ITS EFFECT ON QUANTITATIVE AND QUALITATIVE SEED PRODUCTION IN OLD SAXAUL (Haloxylon aphyllum) FORESTS OF YAZD, IRAN

High-quality and abundant seed production in old saxaul shrubs is essential for the regeneration and sustainable development of saxaul forests in desert areas. The objective of this study was to determine the effects of different pruning heights on saxaul seed production. The study was carried out in a visibly wilted saxaul forest located in the Ashkezar desert in Yazd in 1994. The experiment wa...


Feature-Budgeted Random Forest

We seek decision rules for prediction-time cost reduction, where complete data is available for training, but during prediction-time, each feature can only be acquired for an additional cost. We propose a novel random forest algorithm to minimize prediction error for a user-specified average feature acquisition budget. While random forests yield strong generalization performance, they do not ex...
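The budgeted setting above trades prediction error against expected feature-acquisition cost. A minimal sketch of that cost quantity (with made-up per-feature costs, not from the paper): along each example's root-to-leaf path, every distinct feature is paid for once, and the expected cost averages this over examples.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

# Hypothetical illustration: expected feature-acquisition cost of a single
# decision tree, the quantity a feature-budgeted forest trades off against
# accuracy. The per-feature costs below are assumed for the example.
X, y = load_iris(return_X_y=True)
tree = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X, y)

feature_cost = np.array([1.0, 1.0, 2.0, 2.0])  # assumed costs, one per feature

node_feature = tree.tree_.feature   # feature tested at each node (negative = leaf)
paths = tree.decision_path(X)       # sparse (n_samples, n_nodes) path indicator

costs = []
for i in range(X.shape[0]):
    nodes = paths.indices[paths.indptr[i]:paths.indptr[i + 1]]
    used = {node_feature[n] for n in nodes if node_feature[n] >= 0}
    costs.append(sum(feature_cost[f] for f in used))  # each feature paid once
print(f"expected feature cost per example: {np.mean(costs):.2f}")
```

Note that deeper paths do not necessarily cost more: re-testing an already-acquired feature is free, which is why formulations that encourage feature re-use can cut cost without pruning accuracy away.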


Cost-Complexity Pruning of Random Forests

Random forests perform bootstrap aggregation by sampling the training examples with replacement. This enables evaluation of the out-of-bag error, which serves as an internal cross-validation mechanism. Our motivation lies in using the unsampled training examples to improve each decision tree in the ensemble. We study the effect of using the out-of-bag samples to improve the generalization error first...
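The out-of-bag mechanism described above can be sketched in a few lines: bootstrap sampling leaves roughly a third of the training set out of each tree, and those held-out samples provide a validation estimate at no extra data cost (synthetic data here, purely for illustration).

```python
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier

# Each tree is fit on a bootstrap sample; the examples it never saw
# (its "out-of-bag" samples) are used to score it, giving an internal
# cross-validation estimate without a separate held-out set.
X, y = make_classification(n_samples=500, n_features=10, random_state=0)
rf = RandomForestClassifier(n_estimators=100, oob_score=True,
                            bootstrap=True, random_state=0).fit(X, y)
print(f"OOB accuracy estimate: {rf.oob_score_:.3f}")
```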


Optimally Pruning Decision Tree Ensembles With Feature Cost

We consider the problem of learning decision rules for prediction under a feature budget constraint. In particular, we are interested in pruning an ensemble of decision trees to reduce expected feature cost while maintaining high prediction accuracy for any test example. We propose a novel 0-1 integer program formulation for ensemble pruning. Our pruning formulation is general: it takes any ensembl...


Identifying Student Behavior for Improving Online Course Performance with Machine Learning

In this study we investigate the correlation between student behavior and performance in online courses. Based on the web logs and syllabus of a course, we extract features that characterize student behavior. Using machine learning algorithms, we build models to predict performance at the end of the period. Furthermore, we identify important behaviors and behavior combinations in the models. The res...




Publication date: 2016